- 
            Diffusion policies have achieved superior performance in imitation learning and offline reinforcement learning (RL) due to their rich expressiveness. However, the conventional diffusion training procedure requires samples from the target distribution, which is impossible in online RL since we cannot sample from the optimal policy. Backpropagating the policy gradient through the diffusion process incurs large computational costs and instability, making it expensive and not scalable. To enable efficient training of diffusion policies in online RL, we generalize conventional denoising score matching by reweighting the loss function. The resulting Reweighted Score Matching (RSM) preserves the optimal solution and low computational cost of denoising score matching, while eliminating the need to sample from the target distribution and allowing the policy to learn to optimize value functions. We introduce two tractable reweighted loss functions to solve two commonly used policy optimization problems, policy mirror descent and max-entropy policy optimization, resulting in two practical algorithms named Diffusion Policy Mirror Descent (DPMD) and Soft Diffusion Actor-Critic (SDAC). We conducted comprehensive comparisons on MuJoCo benchmarks. The empirical results show that the proposed algorithms outperform recent diffusion-policy online RL methods on most tasks, and that DPMD improves by more than 120% over Soft Actor-Critic on Humanoid and Ant.
            Free, publicly-accessible full text available July 13, 2026
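To make the reweighting idea concrete, the sketch below applies a per-sample weight to a standard denoising score matching loss so that high-value actions dominate the regression target. The exponentiated-value weighting, the network signature, and the hyperparameters are illustrative assumptions, not the exact RSM/DPMD/SDAC formulation.

```python
# Hypothetical sketch of a reweighted denoising score matching loss (PyTorch).
# The softmax-of-Q weighting is an illustrative choice, not the paper's exact scheme.
import torch

def reweighted_dsm_loss(score_net, states, actions, q_values, sigma=0.1, alpha=1.0):
    """Denoising score matching on replay actions, reweighted by estimated value."""
    noise = torch.randn_like(actions)
    noisy_actions = actions + sigma * noise
    # Score of the Gaussian perturbation kernel at the noisy sample: -noise / sigma.
    target_score = -noise / sigma
    pred_score = score_net(states, noisy_actions, sigma)   # assumed signature
    per_sample = ((pred_score - target_score) ** 2).sum(dim=-1)
    # Reweight each sample so high-value actions contribute more to the fit.
    with torch.no_grad():
        weights = torch.softmax(q_values / alpha, dim=0) * q_values.shape[0]
    return (weights * per_sample).mean()
```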
- 
            Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Eds.)
            Off-policy evaluation (OPE) is one of the most fundamental problems in reinforcement learning (RL): estimating the expected long-term payoff of a given target policy using only experiences from another, potentially unknown, behavior policy. The distribution correction estimation (DICE) family of estimators has advanced the state of the art in OPE by breaking the curse of horizon. However, the major bottleneck in applying DICE estimators lies in the difficulty of solving the saddle-point optimization involved, especially with neural network implementations. In this paper, we tackle this challenge by establishing a linear representation of the value function and the stationary distribution correction ratio, i.e., the primal and dual variables in the DICE framework, using the spectral decomposition of the transition operator. This primal-dual representation not only bypasses the non-convex, non-concave optimization in vanilla DICE, thereby enabling a computationally efficient algorithm, but also paves the way for more efficient utilization of historical data. We highlight that our algorithm, SpectralDICE, is the first to leverage a linear representation of the primal-dual variables that is both computationally and sample efficient, and its performance is supported by a rigorous theoretical sample complexity guarantee and a thorough empirical evaluation on various benchmarks.
            Free, publicly-accessible full text available May 3, 2026
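To illustrate why a linear primal-dual representation matters, here is a minimal sketch of generic linear DICE: with shared features phi(s, a) for the value function (primal) and the correction ratio (dual), the Bellman-flow constraint collapses to a small linear system instead of a saddle-point problem. The feature construction, regularization, and names are assumptions for illustration; this is not the SpectralDICE algorithm or its spectral features.

```python
# Hedged sketch: linear-feature DICE reduces off-policy evaluation to a linear solve.
import numpy as np

def linear_dice_ope(phi_sa, phi_next_sa, phi_init_sa, rewards, gamma, reg=1e-6):
    """Estimate the normalized discounted return (1 - gamma) * E[sum gamma^t r_t].

    phi_sa:      (N, d) features of logged (s, a) pairs
    phi_next_sa: (N, d) features of (s', a') with a' drawn from the target policy
    phi_init_sa: (M, d) features of (s0, a0) with a0 drawn from the target policy
    rewards:     (N,)   logged rewards
    """
    N, d = phi_sa.shape
    # A = E_D[ phi(s,a) (gamma * phi(s',a') - phi(s,a))^T ]
    A = phi_sa.T @ (gamma * phi_next_sa - phi_sa) / N
    b = phi_init_sa.mean(axis=0)
    # Dual weights alpha solve A^T alpha = -(1 - gamma) b (ridge-regularized).
    alpha = np.linalg.solve(A.T + reg * np.eye(d), -(1.0 - gamma) * b)
    # Correction ratios w(s,a) = phi(s,a)^T alpha; OPE estimate is E_D[w * r].
    w = phi_sa @ alpha
    return float(np.mean(w * rewards))
```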
- 
            Abstract: Using the FIRE-2 cosmological zoom-in simulations, we investigate the temporal evolution of gas-phase metallicity radial gradients of Milky Way–mass progenitors in the redshift range 0.4 < z < 3. We pay special attention to the occurrence of positive (i.e., inverted) metallicity gradients, where metallicity increases with galactocentric radius. This trend, contrary to the more commonly observed negative radial gradients, has frequently been seen in recent spatially resolved grism observations. The rate of occurrence of positive gradients in FIRE-2 is about ∼7% for 0.4 < z < 3 and ∼13% at higher redshifts (1.5 < z < 3), broadly consistent with observations. Moreover, we investigate the correlations among galaxy metallicity gradient, stellar mass, star formation rate (SFR), and degree of rotational support. Metallicity gradients show a strong correlation with both sSFR and the rotational-to-dispersion velocity ratio (v_c/σ), implying that starbursts and the kinematic morphology of galaxies play significant roles in shaping these gradients. The FIRE-2 simulations indicate that galaxies with high sSFR and weak rotational support (v_c/σ ≲ 1) are more likely, by ∼15%, to develop positive metallicity gradients. This trend is attributed to galaxy-scale gas flows driven by stellar feedback, which effectively redistribute metals within the interstellar medium. Our results support the important role of stellar feedback in governing the chemo-structural evolution and disk formation of Milky Way–mass galaxies at the cosmic noon epoch.
            Free, publicly-accessible full text available June 17, 2026
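For readers unfamiliar with the quantity being tracked, a radial metallicity gradient is simply the slope of gas-phase metallicity against galactocentric radius; a positive slope is an inverted profile. The sketch below shows that measurement on mock data. The binning, units, and variable names are illustrative and do not reproduce the FIRE-2 analysis pipeline.

```python
# Minimal sketch: metallicity gradient as the slope of a linear fit in dex/kpc.
import numpy as np

def metallicity_gradient(radius_kpc, log_oh):
    """Return d(12 + log(O/H)) / dR in dex/kpc; positive means an inverted profile."""
    slope, _intercept = np.polyfit(radius_kpc, log_oh, deg=1)
    return slope

# Example: a mock galaxy whose metallicity rises outward (positive gradient).
rng = np.random.default_rng(0)
r = np.linspace(0.5, 10.0, 50)                       # galactocentric radius [kpc]
z = 8.4 + 0.02 * r + rng.normal(0.0, 0.01, r.size)   # 12 + log(O/H)
print(f"gradient = {metallicity_gradient(r, z):+.3f} dex/kpc")
```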
- 
            A nonlocal phase-field crystal (NPFC) model is presented as a nonlocal counterpart of the local phase-field crystal (LPFC) model and a special case of the structural PFC (XPFC) derived from classical field theory for crystal growth and phase transitions. The NPFC incorporates a finite range of spatial nonlocal interactions that can account for both repulsive and attractive effects. The specific form is data-driven, determined by fitting to the material's structure factor, and can be much more accurate than the LPFC and the previously proposed fractional variant. In particular, it is able to match the experimental structure factor up to the second peak, an achievement not possible with other PFC variants studied in the literature. Both the LPFC and the fractional PFC (FPFC) are also shown to be distinct scaling limits of the NPFC, which reflects its generality. The NPFC's advantage in retaining material properties suggests that it may be more suitable for characterizing liquid–solid transition systems. Moreover, we study numerical discretizations using Fourier spectral methods, which are shown to be convergent and asymptotically compatible, and therefore robust across different parameter ranges. Numerical experiments are given in the two-dimensional case to demonstrate the effectiveness of the NPFC in simulating crystal structures and grain boundaries.
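As background for the Fourier spectral discretization mentioned above, the sketch below implements a first-order semi-implicit spectral step for the standard local PFC equation. It is a generic illustration of the spectral machinery only: the NPFC would replace the local (1 + ∇²)² operator with its data-driven nonlocal kernel, which is not shown, and the scheme and parameters here are assumptions rather than the paper's discretization.

```python
# Hedged sketch: semi-implicit Fourier spectral step for the local PFC equation
#   d(psi)/dt = Laplacian( (-eps + (1 + Laplacian)^2) psi + psi^3 )
# Linear terms are treated implicitly in Fourier space, the cubic term explicitly.
import numpy as np

def pfc_step(psi, dt=0.1, eps=0.25, L=32 * np.pi):
    n = psi.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    lin = -eps + (1.0 - k2) ** 2              # Fourier symbol of -eps + (1 + Laplacian)^2
    psi_hat = np.fft.fft2(psi)
    nonlin_hat = np.fft.fft2(psi**3)
    psi_hat = (psi_hat - dt * k2 * nonlin_hat) / (1.0 + dt * k2 * lin)
    return np.real(np.fft.ifft2(psi_hat))

# Usage: relax a noisy field toward a crystalline pattern.
rng = np.random.default_rng(0)
psi = -0.25 + 0.02 * rng.standard_normal((128, 128))
for _ in range(200):
    psi = pfc_step(psi)
```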
- 
            Aligning large language models (LLMs) with human objectives is crucial for real-world applications. However, fine-tuning LLMs for alignment often suffers from unstable training and requires substantial computing resources. Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model, and their performance remains dependent on the original model's capabilities. To address these challenges, we propose aligning LLMs through representation editing. The core of our method is to view a pre-trained autoregressive LLM as a discrete-time stochastic dynamical system. To achieve alignment with specific objectives, we introduce external control signals into the state space of this language dynamical system. We train a value function directly on the hidden states according to the Bellman equation, enabling gradient-based optimization to obtain the optimal control signals at test time. Our experiments demonstrate that our method outperforms existing test-time alignment techniques while requiring significantly fewer resources than fine-tuning methods. Our code is available at https://github.com/Lingkai-Kong/RE-Control.
            Free, publicly-accessible full text available December 9, 2025
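The sketch below illustrates the test-time control step described above: a learned value head scores a hidden state, and gradient ascent on that score yields a control signal that is added to the state before decoding continues. The update rule, names, and hyperparameters are assumptions for illustration; training the value head from Bellman targets is omitted, and this is not the RE-Control implementation.

```python
# Hedged sketch of test-time representation editing with a frozen LLM.
import torch

def edit_hidden_state(hidden, value_head, step_size=0.5, n_steps=5):
    """Gradient-ascent on a learned value head to obtain a control signal.

    hidden:     (batch, d) final-layer hidden states of the frozen LLM
    value_head: module mapping (batch, d) -> (batch, 1) value estimates
    """
    control = torch.zeros_like(hidden, requires_grad=True)
    for _ in range(n_steps):
        value = value_head(hidden + control).sum()
        grad, = torch.autograd.grad(value, control)
        # Ascend the value estimate; the LLM weights are never modified.
        control = (control + step_size * grad).detach().requires_grad_(True)
    return (hidden + control).detach()
```

The edited state would then be passed to the unchanged LM head to produce the next-token logits.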